AITopics | identity group

identity group

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text

Sagae, Alicia, Lee, Chia-Jung, Avula, Sandeep, Dang, Brandon, Murdock, Vanessa

arXiv.org Artificial Intelligence, Oct-24-2025

Current methods for evaluating large language models (LLMs) typically focus on high-level tasks such as text generation, without targeting a particular AI application. This approach is not sufficient for evaluating LLMs for Responsible AI dimensions like fairness, since protected attributes that are highly relevant in one application may be less relevant in another. In this work, we construct a dataset that is driven by a real-world application (generate a plain-text product description, given a list of product features), parameterized by fairness attributes intersected with gendered adjectives and product categories, yielding a rich set of labeled prompts. We show how to use the data to identify quality, veracity, safety, and fairness gaps in LLMs, contributing a proposal for LLM evaluation paired with a concrete resource for the research community.
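A parameterized prompt set of the kind the abstract describes can be sketched as a cross product over attribute lists. This is a minimal sketch, not the authors' construction: the category, adjective, and feature values below are hypothetical placeholders, and `build_prompts` is an illustrative name.

```python
from itertools import product

# Hypothetical parameter values; the paper's actual attribute lists are not given here.
CATEGORIES = ["backpack", "desk lamp"]
ADJECTIVES = ["elegant", "rugged"]
FEATURES = {
    "backpack": ["20 L capacity", "water-resistant"],
    "desk lamp": ["LED bulb", "adjustable arm"],
}

def build_prompts(categories, adjectives, features):
    """Return one labeled prompt per (category, adjective) combination."""
    prompts = []
    for category, adjective in product(categories, adjectives):
        feature_list = "; ".join(features[category])
        text = (
            f"Write a plain-text product description for a {adjective} "
            f"{category} with these features: {feature_list}."
        )
        # Keep the parameters next to the text so outputs can be sliced later.
        prompts.append({"category": category, "adjective": adjective, "prompt": text})
    return prompts

prompts = build_prompts(CATEGORIES, ADJECTIVES, FEATURES)
```

Storing the parameters alongside each prompt is what makes the set "labeled": generated descriptions can then be grouped by any attribute when comparing quality or fairness.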

artificial intelligence, large language model, natural language, (18 more...)

doi: 10.1145/3746252.3761642

2510.20782

Country: North America > United States (0.70)

Genre: Research Report (0.50)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Classification is a RAG problem: A case study on hate speech detection

Willats, Richard, Pennington, Josh, Mohan, Aravind, Vidgen, Bertie

arXiv.org Artificial Intelligence, Aug-11-2025

Robust content moderation requires classification systems that can quickly adapt to evolving policies without costly retraining. We present classification using Retrieval-Augmented Generation (RAG), which shifts traditional classification tasks from determining the correct category in accordance with pre-trained parameters to evaluating content in relation to contextual knowledge retrieved at inference. In hate speech detection, this transforms the task from "is this hate speech?" to "does this violate the hate speech policy?" Our Contextual Policy Engine (CPE) - an agentic RAG system - demonstrates this approach and offers three key advantages: (1) robust classification accuracy comparable to leading commercial systems, (2) inherent explainability via retrieved policy segments, and (3) dynamic policy updates without model retraining. Through three experiments, we demonstrate strong baseline performance and show that the system can apply fine-grained policy control by correctly adjusting protection for specific identity groups without requiring retraining or compromising overall performance. These findings establish that RAG can transform classification into a more flexible, transparent, and adaptable process for content moderation and wider classification problems.
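The shift from "is this hate speech?" to "does this violate the policy?" can be illustrated with a toy retrieve-then-judge loop. This is a rough sketch under stated assumptions, not the Contextual Policy Engine: the policy segments are invented for illustration, the retriever is plain word overlap, and `stub_judge` stands in for the LLM call a real system would make.

```python
import re

# Hypothetical policy segments, written for this illustration only.
POLICY = [
    "Attacks on people based on religion are prohibited.",
    "Attacks on people based on nationality are prohibited.",
    "Criticism of institutions or ideas is permitted.",
]

def tokens(text):
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(content, policy, k=1):
    """Return the k policy segments sharing the most words with the content."""
    ranked = sorted(
        policy,
        key=lambda seg: len(tokens(seg) & tokens(content)),
        reverse=True,
    )
    return ranked[:k]

def stub_judge(content, segments):
    """Placeholder for an LLM call: flags content when the retrieved policy prohibits it."""
    return any("prohibited" in seg for seg in segments)

def classify(content, policy, judge=stub_judge):
    """Judge content against retrieved policy text and return the evidence used."""
    segments = retrieve(content, policy)
    return {"violates": judge(content, segments), "evidence": segments}

flagged = classify("People of that religion are awful.", POLICY)
allowed = classify("Criticism of ideas is fine.", POLICY)
```

Because the decision is made against the retrieved segments, the returned `evidence` doubles as an explanation, and editing the `POLICY` list changes behavior without touching any model weights.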

classification, large language model, machine learning, (18 more...)

2508.06204

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Unequal Voices: How LLMs Construct Constrained Queer Narratives

Ghosal, Atreya, Gupta, Ashim, Srikumar, Vivek

arXiv.org Artificial Intelligence, Jul-22-2025

One way social groups are marginalized in discourse is that the narratives told about them often default to a narrow, stereotyped range of topics. In contrast, default groups are allowed the full complexity of human existence. We describe the constrained representations of queer people in LLM generations in terms of harmful representations, narrow representations, and discursive othering and formulate hypotheses to test for these phenomena. Our results show that LLMs are significantly limited in their portrayals of queer personas.

large language model, machine learning, natural language, (20 more...)

2507.15585

Country: Asia > Middle East (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

ModelCitizens: Representing Community Voices in Online Safety

Suvarna, Ashima, Chance, Christina, Naranjo, Karolina, Palangi, Hamid, Hao, Sophie, Hartvigsen, Thomas, Gabriel, Saadia

arXiv.org Artificial Intelligence, Jul-10-2025

Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we introduce MODELCITIZENS, a dataset of 6.8K social media posts and 40K toxicity annotations across diverse identity groups. To capture the role of conversational context on toxicity, typical of social media posts, we augment MODELCITIZENS posts with LLM-generated conversational scenarios. State-of-the-art toxicity detection tools (e.g. OpenAI Moderation API, GPT-o4-mini) underperform on MODELCITIZENS, with further degradation on context-augmented posts. Finally, we release LLAMACITIZEN-8B and GEMMACITIZEN-12B, LLaMA- and Gemma-based models finetuned on MODELCITIZENS, which outperform GPT-o4-mini by 5.5% on in-distribution evaluations. Our findings highlight the importance of community-informed annotation and modeling for inclusive content moderation. The data, models and code are available at https://github.com/asuvarna31/modelcitizens.

annotator, large language model, machine learning, (23 more...)

2507.05455

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Out of Sight Out of Mind: Measuring Bias in Language Models Against Overlooked Marginalized Groups in Regional Contexts

Elsafoury, Fatma, Hartmann, David

arXiv.org Artificial Intelligence, Apr-18-2025

We know that language models (LMs) form biases and stereotypes of minorities, leading to unfair treatment of members of these groups, thanks to research mainly in the US and the broader English-speaking world. As the negative behavior of these models has severe consequences for society and individuals, industry and academia are actively developing methods to reduce the bias in LMs. However, there are many under-represented groups and languages that have been overlooked so far. This includes marginalized groups that are specific to individual countries and regions in the English-speaking and Western world, but crucially also almost all marginalized groups in the rest of the world. The UN estimates that between 600 million and 1.2 billion people worldwide are members of marginalized groups and in need of special protection. If we want to develop inclusive LMs that work for everyone, we have to broaden our understanding to include overlooked marginalized groups and low-resource languages and dialects. In this work, we contribute to this effort with the first study investigating offensive stereotyping bias in 23 LMs for 270 marginalized groups from Egypt, the remaining 21 Arab countries, Germany, the UK, and the US. Additionally, we investigate the impact of low-resource languages and dialects on the study of bias in LMs, demonstrating the limitations of current bias metrics, as we measure significantly higher bias when using the Egyptian Arabic dialect versus Modern Standard Arabic. Our results show that LMs indeed exhibit higher bias against many marginalized groups in comparison to dominant groups. However, this is not the case for Arabic LMs, where the bias is high against both marginalized and dominant groups in relation to religion and ethnicity. Our results also show higher intersectional bias against Non-binary, LGBTQIA+ and Black women.

artificial intelligence, large language model, natural language, (18 more...)

2504.12767

Country:

Europe (1.00)
Asia > Middle East (1.00)
Africa (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine (1.00)
Government > Regional Government (1.00)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Do LLMs exhibit demographic parity in responses to queries about Human Rights?

Javed, Rafiya, Kay, Jackie, Yanni, David, Zaini, Abdullah, Sheikh, Anushe, Rauh, Maribeth, Comanescu, Ramona, Gabriel, Iason, Weidinger, Laura

arXiv.org Artificial Intelligence, Feb-26-2025

This research describes a novel approach to evaluating hedging behaviour in large language models (LLMs), specifically in the context of human rights as defined in the Universal Declaration of Human Rights (UDHR). Hedging and non-affirmation are behaviours that express ambiguity or a lack of clear endorsement on specific statements. These behaviours are undesirable in certain contexts, such as queries about whether different groups are entitled to specific human rights, since all people are entitled to human rights. Here, we present the first systematic attempt to measure these behaviours in the context of human rights, with a particular focus on between-group comparisons. To this end, we design a novel prompt set on human rights in the context of different national or social identities. We develop metrics to capture hedging and non-affirmation behaviours and then measure whether LLMs exhibit demographic parity when responding to the queries. We present results on three leading LLMs and find that all models exhibit some demographic disparities in how they attribute human rights between different identity groups. Furthermore, there is high correlation between different models in terms of how disparity is distributed amongst identities, with identities that have high disparity in one model also facing high disparity in both the other models. While baseline rates of hedging and non-affirmation differ, these disparities are consistent across queries that vary in ambiguity and they are robust across variations of the precise query wording. Our findings highlight the need for work to explicitly align LLMs to human rights principles, and to ensure that LLMs endorse the human rights of all groups equally.
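One common way to summarize demographic parity is the gap between the highest and lowest per-group rate of some behaviour. The sketch below applies that idea to hedging; it is an assumption-laden illustration rather than the paper's metric. The group names and labels are made up, and `hedging_rates` and `parity_gap` are hypothetical helper names.

```python
from collections import defaultdict

def hedging_rates(records):
    """Compute the share of hedged responses per group.

    records: iterable of (group, hedged) pairs, where hedged is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [hedged count, total count]
    for group, hedged in records:
        counts[group][0] += int(hedged)
        counts[group][1] += 1
    return {group: h / n for group, (h, n) in counts.items()}

def parity_gap(rates):
    """Max-min difference in rates; 0.0 means perfect parity on this measure."""
    return max(rates.values()) - min(rates.values())

# Made-up labels: whether each model response hedged on a rights query.
records = [
    ("group_a", False), ("group_a", False), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", True), ("group_b", False), ("group_b", False),
]
rates = hedging_rates(records)
gap = parity_gap(rates)
```

With these illustrative labels, `group_a` is hedged on in one of four responses and `group_b` in two of four, so the gap is the difference between those two rates.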

demographic parity, human right, query, (10 more...)

2502.19463

Country:

Europe > Greece > Attica > Athens (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Sri Lanka (0.04)
(17 more...)

Genre: Research Report > New Finding (0.34)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

Shen, Xinyue, Wu, Yixin, Qu, Yiting, Backes, Michael, Zannettou, Savvas, Zhang, Yang

arXiv.org Artificial Intelligence, Jan-28-2025

Large Language Models (LLMs) have raised increasing concerns about their misuse in generating hate speech. Among all the efforts to address this issue, hate speech detectors play a crucial role. However, the effectiveness of different detectors against LLM-generated hate speech remains largely unknown. In this paper, we propose HateBench, a framework for benchmarking hate speech detectors on LLM-generated hate speech. We first construct a hate speech dataset of 7,838 samples generated by six widely-used LLMs covering 34 identity groups, with meticulous annotations by three labelers. We then assess the effectiveness of eight representative hate speech detectors on the LLM-generated dataset. Our results show that while detectors are generally effective in identifying LLM-generated hate speech, their performance degrades with newer versions of LLMs. We also reveal the potential of LLM-driven hate campaigns, a new threat that LLMs bring to the field of hate speech detection. By leveraging advanced techniques like adversarial attacks and model stealing attacks, the adversary can intentionally evade the detector and automate hate campaigns online. The most potent adversarial attack achieves an attack success rate of 0.966, and its attack efficiency can be further improved by $13-21\times$ through model stealing attacks with acceptable attack performance. We hope our study can serve as a call to action for the research community and platform moderators to fortify defenses against these emerging threats.

detector, large language model, machine learning, (22 more...)

2501.1675

Country:

North America > United States > Alaska (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Target-Aware Analysis of Data Augmentation for Hate Speech Detection

Casula, Camilla, Tonelli, Sara

arXiv.org Artificial Intelligence, Oct-10-2024

Hate speech is one of the main threats posed by the widespread use of social networks, despite efforts to limit it. Although attention has been devoted to this issue, the lack of datasets and case studies centered around scarcely represented phenomena, such as ableism or ageism, can lead to hate speech detection systems that do not perform well on underrepresented identity groups. Given the unprecedented capabilities of LLMs in producing high-quality data, we investigate the possibility of augmenting existing data with generative language models, reducing target imbalance. We experiment with augmenting 1,000 posts from the Measuring Hate Speech corpus, an English dataset annotated with target identity information, adding around 30,000 synthetic examples using both simple data augmentation methods and different types of generative models, comparing autoregressive and sequence-to-sequence approaches. We find traditional DA methods to often be preferable to generative models, but the combination of the two tends to lead to the best results. Indeed, for some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline. This work contributes to the development of systems for hate speech detection that are not only better performing but also fairer and more inclusive towards targets that have been neglected so far.

computational linguistic, information, proceedings, (15 more...)

2410.08053

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
(12 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (0.46)
Information Technology > Services (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Identity-related Speech Suppression in Generative AI Content Moderation

Anigboro, Oghenefejiro Isaacs, Crawford, Charlie M., Metaxa, Danaë, Friedler, Sorelle A.

arXiv.org Artificial Intelligence, Sep-9-2024

Automated content moderation systems have long been used to help reduce the occurrence of violent, hateful, sexual, or otherwise undesired user-generated content online, including in online comment sections and by social media platforms [7, 19, 24]. As content is generated by AI systems, automated content moderation techniques are being applied to the text generated by these systems to filter unwanted content before it is shown to users [21, 22]. However, content moderation is known to suffer from identity-related biases, such that speech by or about marginalized identities is more likely to be incorrectly flagged as inappropriate content [5, 10, 27]. In this paper, we conduct an audit of five content moderation systems to measure identity-related speech suppression, introducing benchmark datasets and definitions to quantify these biases in the context of generative AI systems. Previous assessments of content moderation systems have used benchmark datasets to measure effectiveness and bias. These include datasets composed of user-generated content, such as tweets or internet comments, that have been hand-labeled according to a content moderation rubric [2, 8]. However, most of these datasets are composed of short-form content and do not include the types of text involved in generative AI systems, be they user-generated prompts or system-provided responses. Automated content moderation systems applied in generative AI settings may have unexpected or undesired results, for example flagging PG-rated movie scripts as inappropriate content [21]. As generative AI is increasingly used for creative and expressive text generation from schools to Hollywood, this paper is motivated by this question: whose stories won't be told?

identity group, wiki category, wikipedia, (9 more...)

2409.13725

Country:

Africa > Middle East > Egypt (0.14)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
South America > Paraguay (0.04)
(55 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT

Ali, Muhammad, Panda, Swetasudha, Shen, Qinlan, Wick, Michael, Kobren, Ari

arXiv.org Artificial Intelligence, Jul-25-2024

In the current landscape of language model research, larger models, larger datasets and more compute seem to be the only way to advance towards intelligence. While there have been extensive studies of scaling laws and models' scaling behaviors, the effect of scale on a model's social biases and stereotyping tendencies has received less attention. In this study, we explore the influence of model scale and pre-training data on its learnt social biases. We focus on BERT -- an extremely popular language model -- and investigate biases as they show up during language modeling (upstream), as well as during classification applications after fine-tuning (downstream). Our experiments on four architecture sizes of BERT demonstrate that pre-training data substantially influences how upstream biases evolve with model scale. With increasing scale, models pre-trained on large internet scrapes like Common Crawl exhibit higher toxicity, whereas models pre-trained on moderated data sources like Wikipedia show greater gender stereotypes. However, downstream biases generally decrease with increasing model scale, irrespective of the pre-training data. Our results highlight the qualitative role of pre-training data in the biased behavior of language models, an often overlooked aspect in the study of scale. Through a detailed case study of BERT, we shed light on the complex interplay of data and model scale, and investigate how it translates to concrete biases.

arxiv preprint arxiv, dataset, sentiment, (15 more...)

2407.21058

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
